Skip to content

Selector validation #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 13, 2019

Conversation

raxbg
Copy link
Contributor

@raxbg raxbg commented Feb 25, 2019

The way we currently detect extra closing block brackets } is not perfect and still causes issues, because we just ignore them. This however does not match the browsers' behavior. Depending on the case the extra closing brackets might end up as part of the selector. Take a look at the following:

@keyframes mymove {
  from { top: 0px; }
}

#test {
  color: white;
  background: green;
}

body
  background: black;
  }

#test {
  display: block;
  background: red;
  color: white;
}
#test {
  display: block;
  background: white;
  color: black;
}

The body selector is missing an opening { so the selector string will be everything up to the next { found at the first #test { declaration. This makes the selector invalid, so the first definition of #test must be skipped, but parsing must continue. The final tree must include the @keyframes, first #test definition and the last #test definition. The rules in between must be skipped.

This is why I think we should have a selector validation and skip the rule sets for invalid selectors (which matches the browsers' behavior).

The modifications in this PR will change the output of the library, so this change might be considered BC breaking, but I believe it is a move in the right direction. I tried to build the selector validation according to the definition found in this document. More specifically this line In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_);. It is not perfect yet, because it doesn't check for escaped characters, or a valid start of the selector. I might try to add these if we agree that this is a viable approach to the problem.

Let me know what you think.

@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@MyIntervals MyIntervals deleted a comment May 8, 2019
@sabberworm
Copy link
Collaborator

Thanks! I did start work on proper selector parsing some time ago but did not have time to finish it. So for now this will have to do. Does it cover cases like:

.this-selector [should="be-{"] .valid {
}

and

.this-selector /* should remain-} */ .valid {
}

?

Also see the ignored test case in https://github.com/sabberworm/PHP-CSS-Parser/blob/a90142fa0c664db18ea01948a61466bf1ff79220/tests/files/-tobedone.css#L1.

@raxbg
Copy link
Contributor Author

raxbg commented May 28, 2019

Just tested how Chrome behaves in these cases and the first example is actually invalid. The second one, however, should be valid but it is currently not. I will try to fix this soon.

@sabberworm
Copy link
Collaborator

sabberworm commented Jun 2, 2019

Just tested how Chrome behaves in these cases and the first example is actually invalid.

No it isn’t. See https://codepen.io/anon/pen/GazJYb

Works flawlessly on Firefox and Chrome.

@raxbg
Copy link
Contributor Author

raxbg commented Jul 12, 2019

Just pushed an update which resolves the pending issues. Please take another look.

Copy link
Collaborator

@sabberworm sabberworm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this looks good but I do have some reservation about this huge regex that is too complicated. Could you maybe make that more readable?

public function __construct($sSelector, $bCalculateSpecificity = false) {
if (!Selector::isValid($sSelector)) {
throw new UnexpectedTokenException("Selector did not match '" . self::SELECTOR_VALIDATION_RX . "'.", $sSelector, "custom");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure it is the job of the Selector class to validate its input. I think it’s the job of the selector parsing logic (i.e. DeclarationBlock::parse) to do that.

@@ -35,10 +37,21 @@ class Selector {
))
/ix';

const SELECTOR_VALIDATION_RX = '/
^((?:[a-zA-Z0-9\x{00A0}-\x{FFFF}_\^\$\|\*\=\"\'\~\[\]\(\)\-\s\.:#\+\>]*(?:\\\\.)?(?:\'.*?\')?(?:\".*?\")?)*|\s*?[\+-]?\d+\%\s*)$
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is too complicated to understand. If it can’t be simplified, maybe a regex isn’t the right tool for the job. But since the end goal for selectors still is a complete parser (which will make the regex obsolete), I’ll allow it for now. But maybe you could split the regex to multiple lines using concatenation and comment each line so we better understand what it does.

Also, some of the escapes are unnecessary. Inside character classes [], you only need to escape ] and - (in some cases) (and maybe [ to be symmetrical), but ^, $, |, *, =, ", ~, +, (, ) can be left literal (' was already literal since \' is a string escape, not a regex escape).

Copy link
Collaborator

@sabberworm sabberworm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me.

@sabberworm sabberworm merged commit 4ee910b into MyIntervals:master Jul 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants